We investigate the predictability of extreme events in time series. The focus of this work is
to understand under which circumstances large events are better predictable than smaller events. 
Therefore we use a simple prediction algorithm based on precursory structures which are identified 
using the maximum likelihood principle. Using the receiver operator characteristic curve as a mea- 
sure for the quality of predictions we find that the dependence on the event magnitude is closely 
linked to the probability distribution function of the underlying stochastic process. We evaluate this 
dependence on the probability distribution function analytically and numerically. If we assume that 
the optimal precursory structures are used to make the predictions, we find that large increments 
are better predictable if the underlying stochastic process has a Gaussian probability distribution 
function, whereas larger increments are harder to predict if the underlying probability distribution 
function has a power law tail. In the case of an exponential distribution function we find no signif- 
icant dependence on the event magnitude. Furthermore we compare these results with predictions 
of increments in correlated data, namely, velocity increments of a free jet flow. Depending on the
time scale, the velocity increments in the free jet flow are asymptotically either Gaussian or
exponentially distributed. The numerical results for predictions within free jet data
are in good agreement with the previous analytical considerations for random numbers.
I. INTRODUCTION

Systems with a complex time evolution, which gener- 
ate a great impact event from time to time, are ubiqui- 
tous. Examples include fluctuations of prices for finan- 
cial assets in economy with rare market crashes, electrical 
activity of human brain with rare epileptic seizures, seis- 
mic activity of the earth with rare earthquakes, changing 
weather conditions with rare disastrous storms, and also 
fluctuations of online diagnostics of technical machinery 
and networks with rare breakdowns or blackouts. Due 
to the complexity of the systems mentioned, a complete 
modeling is usually impossible, either due to the huge 
number of degrees of freedom involved, or due to a lack 
of precise knowledge about the governing equations. This 
is why one applies the framework of prediction via pre- 
cursory structures for such cases. The typical application 
for prediction with precursory structures is a prediction 
of an event which occurs in the very near future, i.e., 
on short timescales compared to the lifetime of the system.
A classical example for the search for precursory
structures is the prediction of earthquakes. A more
recently studied example is the short term prediction of
strong turbulent wind gusts, which can destroy wind turbines.

In a previous work [3], we studied the quality of pre- 
dictions analytically via precursory structures for incre- 
ments in an AR(1) process and numerically in a long- 
range correlated ARMA process. The long-range corre- 
lations did not alter the general findings for Gaussian pro- 
cesses, namely, that larger events are better predictable. 

Furthermore we found other works which report the
same effect for earthquake prediction, for the prediction of
avalanches in SOC models, and in multiagent games.
In this contribution, we investigate the influence of
the probability distribution function (PDF) of the noise 
term in detail by using not only Gaussian, but also expo- 
nential and power-law distributed noise. This approach 
is also motivated by the book of Egan, which explains
that receiver operator characteristics (ROC) obtained in
signal detection problems can form ordered families of
functions depending on a parameter. We are now
interested in learning how the behavior of these families of
functions depends on the event magnitude and the
distribution of the stochastic process, if the ROC curve is
used for evaluating the quality of predictions.
After defining the prediction scheme in Sec. II A and the
method for measuring the quality of a prediction in Sec.
II B, we explain in Sec. II C how to consider the influence
of the event magnitude. In Sec. II D we formulate a
constraint which has to be fulfilled in order to find a better
predictability of larger (smaller) events. In the next
section, we apply this constraint to compare the quality
of predictions of large increments within Gaussian (Sec.
III A), exponentially distributed (Sec. III B), and power-law
distributed i.i.d. random numbers (Sec. III C). We study
the prediction of increments in free jet data in Sec. IV.
Conclusions appear in Sec. V.



II. DEFINITIONS AND SETUP 

The considerations in this section are made for a time
series, i.e., a set of measurements $x_n$ at discrete
times $t_n$, where $t_n = t_0 + n\Delta$ with a sampling interval
$\Delta$ and $n \in \mathbb{N}$. The recording should contain sufficiently
many extreme events so that we are able to extract sta- 
tistical information about them. We also assume that 
the event of interest can be identified on the basis of the 
observations, e.g. by the value of the observation func- 
tion exceeding some threshold, by a sudden increase, or 
by its variance exceeding some threshold. We express the 
presence (absence) of an event by using a binary variable
$Y_{n+1}$:

$$
Y_{n+1} =
\begin{cases}
1 & \text{an event occurred at time } n+1, \\
0 & \text{no event occurred at time } n+1.
\end{cases}
\tag{1}
$$



A. The choice of the precursor 

When we consider prediction via precursory structures 
(precursors, or predictors) , we are typically in a situation 
where we assume that the dynamics of the system under 
study has both a deterministic and a stochastic part.
The deterministic part allows one to assume that there is
a relation between the event and its precursory structure
which we can use for predictive purposes. However, if
the dynamics of the system were fully deterministic, there
would be no need to predict via precursory structures;
instead we could exploit our knowledge about the dynamical
system, as is done, e.g., in weather forecasting.

In this contribution we focus on the influence of the 
stochastic part of the dynamics and assume therefore a 
very simple deterministic correlation between event and 
precursor. The presence of this stochastic part implies
that we cannot expect the precursor to precede every
individual event. That is why we define a precursor in this
context as a data structure which typically precedes
an event, allowing deviations from the given structure,
but also allowing events without a preceding structure.

For reasons of simplicity the following considerations 
are made for precursors in real space, i.e., structures in 
the time series. However, there is no reason not to apply
the same ideas to precursory structures that live in
phase space.

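In order to predict an event $Y_{n+1}$ occurring at the time
$(n + 1)$ we compare the last $k$ observations, to which we
will refer as the precursory variable

$$
\mathbf{x}_{(n-k+1,n)} = (x_{n-k+1}, x_{n-k+2}, \ldots, x_{n-1}, x_n), \tag{2}
$$

with a specific precursory structure

$$
\mathbf{x}^{\mathrm{pre}} = (x^{\mathrm{pre}}_{n-k+1}, x^{\mathrm{pre}}_{n-k+2}, \ldots, x^{\mathrm{pre}}_{n-1}, x^{\mathrm{pre}}_{n}). \tag{3}
$$

Once the precursory structure $\mathbf{x}^{\mathrm{pre}}$ is determined, we give
an alarm for an event $Y_{n+1} = 1$ when we find the precursory
variable $\mathbf{x}_{(n-k+1,n)}$ inside the volume
$V^{\mathrm{pre}}(\delta, \mathbf{x}^{\mathrm{pre}})$ of Eq. (4).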

There are different strategies to identify suitable precursory
structures. We choose the precursor via maximizing
a conditional probability which we refer to as the likelihood.
The likelihood

$$
L(Y_{n+1} = 1 \,|\, \mathbf{x}_{(n-k+1,n)}) = \frac{j(Y_{n+1} = 1, \mathbf{x}_{(n-k+1,n)})}{p(\mathbf{x}_{(n-k+1,n)})} \tag{5}
$$

provides the probability that an event $Y_{n+1} = 1$ follows
the precursor $\mathbf{x}_{(n-k+1,n)}$. It can be calculated numerically
by using the joint PDF $j(Y_{n+1} = 1, \mathbf{x}_{(n-k+1,n)})$.
Our prediction strategy consists of determining those values
of each component $x_i$ of $\mathbf{x}_{(n-k+1,n)}$ for which the
likelihood is maximal.
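
The selection of a precursor by maximizing the likelihood can be sketched numerically for the simplest case $k = 1$. The following is only an illustrative sketch under stated assumptions: the event is taken to be an increment $x_{n+1} - x_n \geq \eta$, the likelihood is estimated by histogram counting, and the function names, the bin count, and the minimum occupancy `min_count` are hypothetical choices that do not come from the text.

```python
import random

def estimate_likelihood(series, eta, n_bins=20, min_count=50):
    """Estimate L(Y_{n+1}=1 | x_n) by binning x_n and counting events.

    Assumption: an event is an increment x_{n+1} - x_n >= eta.
    Bins with fewer than min_count entries are skipped (hypothetical cutoff).
    """
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    events = [0] * n_bins
    for x_now, x_next in zip(series, series[1:]):
        b = min(int((x_now - lo) / width), n_bins - 1)
        counts[b] += 1
        if x_next - x_now >= eta:
            events[b] += 1
    centers, likelihoods = [], []
    for b in range(n_bins):
        if counts[b] >= min_count:
            centers.append(lo + (b + 0.5) * width)
            likelihoods.append(events[b] / counts[b])
    return centers, likelihoods

def choose_precursor(centers, likelihoods):
    """Return the bin center at which the estimated likelihood is maximal."""
    best = max(range(len(likelihoods)), key=likelihoods.__getitem__)
    return centers[best]

rng = random.Random(0)
series = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
centers, likelihoods = estimate_likelihood(series, eta=1.0)
x_pre = choose_precursor(centers, likelihoods)
```

For i.i.d. Gaussian numbers the estimated precursor ends up among the most negative, still well populated values of $x_n$, which is what one expects when large increments are most likely to start from small values.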

This strategy to identify the optimal precursor repre- 
sents a rather fundamental choice. In more applied ex- 
amples one looks for precursors which minimize or max- 
imize more sophisticated quantities, e.g., discriminant 
functions or loss matrices. These quantities are usually 
functions of the posterior PDF or the likelihood, but they 
take into account the additional demands of the specific 
problem, e.g., minimizing the loss due to a false predic- 
tion. The strategy studied in this contribution is thus 
fundamental in the sense that it enters into many of the 
more sophisticated quantities which were used for pre- 
dictions and decision making. 



B. Testing for predictive power 

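A common method to verify a hypothesis or to test the
quality of a prediction is the receiver operating characteristic
curve (ROC). The idea of the ROC
consists simply of comparing the rate of correctly predicted
events $r_c$ with the rate of false alarms $r_f$ by plotting
$r_c$ vs. $r_f$. The rate of correct predictions $r_c$ and
the rate of false alarms $r_f$ can be obtained by integrating
the a posteriori PDFs $p(\mathbf{x}_{(n-k+1,n)} | Y_{n+1} = 1)$ and
$p(\mathbf{x}_{(n-k+1,n)} | Y_{n+1} = 0)$ on the precursory volume

$$
V^{\mathrm{pre}}(\delta, \mathbf{x}^{\mathrm{pre}}) = \prod_{j=n-k+1}^{n} \left[ x_j^{\mathrm{pre}} - \frac{\delta}{2},\; x_j^{\mathrm{pre}} + \frac{\delta}{2} \right], \tag{4}
$$

$$
r_c(\delta, \mathbf{x}^{\mathrm{pre}}) = \int p(\mathbf{x}_{(n-k+1,n)} | Y_{n+1} = 1)\, dV^{\mathrm{pre}}(\delta, \mathbf{x}^{\mathrm{pre}}), \tag{6}
$$

$$
r_f(\delta, \mathbf{x}^{\mathrm{pre}}) = \int p(\mathbf{x}_{(n-k+1,n)} | Y_{n+1} = 0)\, dV^{\mathrm{pre}}(\delta, \mathbf{x}^{\mathrm{pre}}). \tag{7}
$$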

Note that these rates are defined with respect to the total
numbers of events $Y_{n+1} = 1$ and nonevents $Y_{n+1} = 0$.
Thus the relative frequency of events has no direct influence
on the ROC, unlike on other measures of predictability,
such as the Brier score or the ignorance [14].

Plotting $r_c$ vs. $r_f$ for increasing values of $\delta$ one obtains
a curve in the unit square of the $r_f$-$r_c$ plane (see, e.g.,
Fig. 3). The curve approaches the origin for $\delta \to 0$ and
the point $(1,1)$ in the limit $\delta \to \infty$, where $\delta$ accounts
for the magnitude of the precursory volume $V^{\mathrm{pre}}(\delta)$. The
shape of the curve characterizes the significance of the
prediction. A curve above the diagonal reveals that the
corresponding strategy of prediction is better than a ran- 
dom prediction which is characterized by the diagonal. 
Furthermore we are interested in curves which converge 
as fast as possible to 1, since this scenario tells us that 
we reach the highest possible rate of correct prediction 
without having a large rate of false alarms. 
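
The construction of an empirical ROC curve from these two rates can be sketched as follows. This is a minimal illustration, assuming a single precursory variable ($k = 1$), increments $x_{n+1} - x_n \geq \eta$ as events, and an alarm interval of width $\delta$ centered on a given precursor; the function name `roc_points` and the chosen values of `eta`, `x_pre`, and `deltas` are hypothetical.

```python
import random

def roc_points(series, eta, x_pre, deltas):
    """Return one (r_f, r_c) pair per alarm width delta (k = 1 precursor).

    An alarm is given when |x_n - x_pre| <= delta / 2; an event is assumed
    to be an increment x_{n+1} - x_n >= eta.
    """
    pairs = list(zip(series, series[1:]))
    is_event = [x_next - x_now >= eta for x_now, x_next in pairs]
    n_events = sum(is_event)
    n_nonevents = len(pairs) - n_events
    points = []
    for delta in deltas:
        hits = false_alarms = 0
        for (x_now, _), event in zip(pairs, is_event):
            if abs(x_now - x_pre) <= delta / 2:
                if event:
                    hits += 1
                else:
                    false_alarms += 1
        # Rates are normalised by the total numbers of events and nonevents.
        points.append((false_alarms / n_nonevents, hits / n_events))
    return points

rng = random.Random(1)
series = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
points = roc_points(series, eta=2.0, x_pre=-3.0,
                    deltas=[0.5, 1.0, 2.0, 4.0, 8.0, 20.0])
```

Because the alarm intervals are nested, both rates grow monotonically with $\delta$, and the widest interval reproduces the point $(1,1)$.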

That is why we use the so-called likelihood ratio as a
summary index to quantify the ROC. For our inference
problems the likelihood ratio is identical to the slope $m$ of
the ROC curve in the vicinity of the origin, which implies
$\delta \to 0$. This region of the ROC is particularly
interesting, since it corresponds to a low rate of false alarms.
The term likelihood ratio results from signal detection
theory. In the context of signal detection theory, the
term a posteriori PDF refers to the PDF which we call
likelihood in the context of predictions, and vice versa.
This is due to the fact that the aim of signal detection
is to identify a signal which was already observed in the
past, whereas predictions are made about future events.
Thus the "likelihood ratio" is in our notation a ratio of
a posteriori PDFs,



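$$
m = \frac{\Delta r_c}{\Delta r_f} \approx \frac{p(\mathbf{x}^{\mathrm{pre}} | Y_{n+1} = 1)}{p(\mathbf{x}^{\mathrm{pre}} | Y_{n+1} = 0)} + \mathcal{O}(\delta). \tag{8}
$$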



However, we will use the common name likelihood ratio
throughout the text. For other problems the name likelihood
ratio is also used for the slope at every point of
the ROC. Since we apply the likelihood ratio as a summary
index for the ROC, we specify that for our purposes
the term likelihood ratio refers only to the slope of the
ROC curve in the vicinity of the origin as in Eq. (8).

Note that one can show that the precursor which
maximizes the likelihood, as explained in Sec. II A, also
maximizes $m$ and is in this sense the optimal precursor.



C. Addressing the dependence on the event 
magnitude 

We are now interested in learning how the predictability
depends on the event magnitude $\eta$, which is measured
in units of the standard deviation of the time series under
study. Thus the event variable $Y_{n+1}$ becomes dependent
on the event magnitude:

$$
Y_{n+1}(\eta) =
\begin{cases}
1 & \text{an event of magnitude } \eta \text{ or larger occurred at time } n+1, \\
0 & \text{no event of magnitude } \eta \text{ or larger occurred at time } n+1.
\end{cases}
\tag{9}
$$



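Via Bayes' theorem the likelihood ratio can be expressed
in terms of the likelihood $L(Y_{n+1}(\eta) = 1 | \mathbf{x}^{\mathrm{pre}})$
and the total probability to find events $P(Y_{n+1}(\eta) = 1)$.
Inserting the technical details of the calculation of the
likelihood and the total probability (see the appendix)
we can see that the likelihood ratio depends sensitively
on the joint PDF $j(\mathbf{x}_{(n-k+1,n)}, Y_{n+1}(\eta) = 1)$ of precursory
variable and event:

$$
m(Y_{n+1}(\eta), \mathbf{x}_{(n-k+1,n)}) =
\frac{1 - P(Y_{n+1}(\eta) = 1)}{P(Y_{n+1}(\eta) = 1)}\;
\frac{\dfrac{j(\mathbf{x}_{(n-k+1,n)}, Y_{n+1}(\eta) = 1)}{p(\mathbf{x}_{(n-k+1,n)})}}
     {1 - \dfrac{j(\mathbf{x}_{(n-k+1,n)}, Y_{n+1}(\eta) = 1)}{p(\mathbf{x}_{(n-k+1,n)})}}
$$

with

$$
j(\mathbf{x}_{(n-k+1,n)}, Y_{n+1}(\eta) = 1) = \int_M dx_{n+1}\, j(\mathbf{x}_{(n-k+1,n)}, x_{n+1}),
\qquad M = \{x_{n+1} : Y_{n+1} = 1\}, \tag{10}
$$

and

$$
p(\mathbf{x}_{(n-k+1,n)}) = j(\mathbf{x}_{(n-k+1,n)}, Y_{n+1}(\eta) = 1) + j(\mathbf{x}_{(n-k+1,n)}, Y_{n+1}(\eta) = 0).
$$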



Hence once the precursor is chosen, the dependence on
the event magnitude $\eta$ enters into the likelihood ratio
via the joint PDF of event and precursor. Looking at
the rather technical formula in Eq. (10), there are two
aspects which we find remarkable:

(I) The slope of the ROC curve is fully characterized 
by the knowledge of the joint PDF of precursory 
variable and event. This implies that in the frame- 
work of statistical predictions all kinds of (long- 
range) correlations which might be present in the 
time series influence the quality of the predictions 
only through their influence on the joint PDF. 



(II) The definition of the event, e.g., as a threshold
crossing or an increment, changes this dependence
only insofar as it enters into the choice of the
precursor and influences the set on which the
integrals in Eq. (10) are carried out. Both $Y_{n+1}(\eta)$
and the set $M$ have to be defined according to the
type of events one predicts. When predicting, e.g.,
increments $x_{n+1} - x_n \geq \eta$ via the precursory variable
$x_n$, then $M = [a, b]$ with $a(Y_{n+1}(\eta)) = x_n + \eta$
for the lower border and $b(Y_{n+1}(\eta)) = \infty$ for the
upper border. In order to predict threshold crossings
at $x_{n+1}$ via $x_n$ one uses $a(Y_{n+1}(\eta)) = \eta$.

Exploiting Eq. (10) we can hence determine the dependence
of the likelihood ratio and the ROC curve on the
event magnitude $\eta$ via the dependence of the joint PDF
of the process under study.
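
The step from the joint PDF to the likelihood ratio can be written down in a few lines. The sketch below assumes that the Bayes relation takes the form $m = L(1-P)/[(1-L)P]$ with $L = j(\mathbf{x}, Y=1)/p(\mathbf{x})$, in which the marginal PDF of the precursory variable cancels; the function names and the numerical values of the joint PDF are hypothetical.

```python
def likelihood_from_joint(joint_event, joint_nonevent):
    """L = j(x, Y=1) / p(x) with p(x) = j(x, Y=1) + j(x, Y=0)."""
    return joint_event / (joint_event + joint_nonevent)

def likelihood_ratio(likelihood, event_prob):
    """Slope m of the ROC curve near the origin, via Bayes' theorem.

    p(x|Y=1) / p(x|Y=0) = [L p(x) / P] / [(1 - L) p(x) / (1 - P)],
    so the marginal PDF p(x) drops out.
    """
    if not (0.0 < likelihood < 1.0 and 0.0 < event_prob < 1.0):
        raise ValueError("likelihood and event_prob must lie strictly in (0, 1)")
    return likelihood * (1.0 - event_prob) / ((1.0 - likelihood) * event_prob)

# Hypothetical values of the joint PDF at the precursor, and of P(Y=1).
L_pre = likelihood_from_joint(joint_event=0.08, joint_nonevent=0.02)
m = likelihood_ratio(L_pre, event_prob=0.2)
```

A precursor that carries no information has $L = P$ and therefore $m = 1$, which corresponds to the diagonal of the ROC plot.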

D. Constraint for increasing quality of predictions 
with increasing event magnitude 

In order to study the dependence of the likelihood ratio
on the event magnitude we are going to introduce a
constraint which the likelihood and the total probability
to find events have to fulfill in order to find a better
predictability of larger (smaller) events.

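In order to improve the readability of the paper, we will
first introduce the following notations for the a posteriori
PDFs, the likelihood and the total probability to find
events:

$$
p_c(\mathbf{x}_{(n-k+1,n)}, \eta) = p(\mathbf{x}_{(n-k+1,n)} | Y_{n+1}(\eta) = 1), \tag{11}
$$

$$
p_f(\mathbf{x}_{(n-k+1,n)}, \eta) = p(\mathbf{x}_{(n-k+1,n)} | Y_{n+1}(\eta) = 0), \tag{12}
$$

$$
L(\eta, \mathbf{x}_{(n-k+1,n)}) = L(Y_{n+1}(\eta) = 1 | \mathbf{x}_{(n-k+1,n)}), \tag{13}
$$

$$
P(\eta) = P(Y_{n+1}(\eta) = 1). \tag{14}
$$

We can then ask for the change of the likelihood ratio
with changing event magnitude $\eta$,

$$
\frac{\partial}{\partial \eta}\, m(Y_{n+1}(\eta), \mathbf{x}_{(n-k+1,n)}) \gtreqless 0. \tag{15}
$$

The derivative of the likelihood ratio is positive (negative,
zero) if the following sufficient condition $c(\eta, \mathbf{x}_{(n-k+1,n)})$ is fulfilled:

$$
c(\eta, \mathbf{x}_{(n-k+1,n)}) = \frac{\partial}{\partial \eta} \ln L(\eta, \mathbf{x}_{(n-k+1,n)})
- \frac{1 - L(\eta, \mathbf{x}_{(n-k+1,n)})}{1 - P(\eta)}\, \frac{\partial}{\partial \eta} \ln P(\eta) \gtreqless 0. \tag{16}
$$

Hence one can tell for an arbitrary process whether extreme
events are better predictable by simply testing whether the
marginal PDF of the event and the likelihood of event
and precursor fulfill Eq. (16).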
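
A quick way to evaluate such a condition for a given process is by finite differences. The sketch below assumes that the condition has the form $c = \partial_\eta \ln L - \frac{1-L}{1-P}\,\partial_\eta \ln P$, which follows from differentiating $\ln m$ with $m = L(1-P)/[(1-L)P]$. As a test case it uses the Gaussian expressions $L = \tfrac12\,\mathrm{erfc}((x_n + \sigma\eta)/\sqrt{2}\sigma)$ and $P = \tfrac12\,\mathrm{erfc}(\eta/2)$ with $\sigma = 1$; the function names and the step size `h` are hypothetical.

```python
import math

def condition_c(likelihood, event_prob, eta, x, h=1e-5):
    """Finite-difference estimate of c(eta, x).

    likelihood(eta, x) and event_prob(eta) are callables returning values
    strictly between 0 and 1; h is a hypothetical step size.
    """
    L = likelihood(eta, x)
    P = event_prob(eta)
    dlnL = (math.log(likelihood(eta + h, x))
            - math.log(likelihood(eta - h, x))) / (2.0 * h)
    dlnP = (math.log(event_prob(eta + h))
            - math.log(event_prob(eta - h))) / (2.0 * h)
    return dlnL - (1.0 - L) / (1.0 - P) * dlnP

def gaussian_likelihood(eta, x_n):
    """Probability of an increment >= eta given x_n (unit variance assumed)."""
    return 0.5 * math.erfc((x_n + eta) / math.sqrt(2.0))

def gaussian_event_prob(eta):
    """Total probability of an increment >= eta for i.i.d. unit Gaussians."""
    return 0.5 * math.erfc(eta / 2.0)

c_inside = condition_c(gaussian_likelihood, gaussian_event_prob, eta=6.0, x=-4.0)
c_outside = condition_c(gaussian_likelihood, gaussian_event_prob, eta=6.0, x=-2.0)
```

With these inputs the sign of the condition differs between a precursory value inside the interval $(-\sigma\eta, -\sigma\eta/2)$ and one outside of it.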



III. PREDICTIONS OF INCREMENTS IN I.I.D. 
RANDOM NUMBERS 

In this section we test the condition $c(\eta, \mathbf{x}_{(n-k+1,n)})$
as given in Eq. (16) for increments in Gaussian, power-law,
and exponentially distributed i.i.d. random numbers.
We thus concentrate on extreme events which consist of
a sudden increase (or decrease) of the observed variable
within a few time steps. Examples of this kind of extreme
events are increases in wind speed, but also
stock market crashes, which consist of sudden
decreases.

We define our extreme event by an increment $x_{n+1} - x_n$
exceeding a given threshold $\eta$,

$$
x_{n+1} - x_n \geq \eta, \tag{17}
$$

where $x_n$ and $x_{n+1}$ denote the observed values at two
consecutive time steps and the event magnitude $\eta$ is again
measured in units of the standard deviation.

Since the first part of the increment, $x_n$, can be used
as a precursory variable, the definition of the event as
an increment introduces a correlation between the event
and the precursory variable $x_n$. Hence the prediction of
increments in random numbers provides a simple, but not
unrealistic example which allows us to study the influence
of the distribution of the underlying process on the
event-magnitude dependence of the quality of prediction.

In the examples which we study in this section the joint
PDF of precursory variable and event is known and we
can hence evaluate $c(\eta, x_n)$ analytically. A mathematical
expression for a filter which selects the PDF of our extreme
events out of the PDFs of the underlying stochastic
process can be obtained by applying the Heaviside
function $\Theta(x_{n+1} - x_n - \eta)$ to the joint PDF. This method
is described in more detail in the appendix.

Since in most cases the structure of the PDF is not
known analytically, we are also interested in evaluating
$c(\eta, x_n)$ numerically. In this case the approximations of
the total probability and the likelihood are obtained by
"binning and counting" and their numerical derivatives
are evaluated via a Savitzky-Golay filter. The
numerical evaluation is done within 10'' data points. In
order to check the stability of this procedure, we evaluate
$c(\eta, x_n)$ also on 20 bootstrap samples which are generated
from the original data set. These bootstrap samples consist
of 10^ pairs of event and precursory variable, which
were drawn randomly from the original data set. Thus
their PDFs are slightly different in their first and second
moment and they contain different numbers of events.
Evaluating $c(\eta, x_n)$ on the bootstrap samples thus shows
how sensitive our numerical evaluation procedure is
towards changes in the numbers of events. This is especially
important for large and therefore rare events.
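
The bootstrap check can be illustrated on the simplest quantity involved, the total probability to find events. The sketch below is an assumption-laden illustration, not the procedure used for the figures: it resamples pairs of precursory variable and successor with replacement and counts events, it leaves out the Savitzky-Golay derivative step, and the sample sizes, the seed, and the function names are hypothetical.

```python
import math
import random

def event_fraction(pairs, eta):
    """Fraction of (x_n, x_{n+1}) pairs whose increment is at least eta."""
    return sum(1 for x_now, x_next in pairs if x_next - x_now >= eta) / len(pairs)

def bootstrap_event_fractions(pairs, eta, n_samples=20, sample_size=None, seed=0):
    """Event fractions on bootstrap samples drawn with replacement."""
    rng = random.Random(seed)
    size = len(pairs) if sample_size is None else sample_size
    return [event_fraction(rng.choices(pairs, k=size), eta)
            for _ in range(n_samples)]

rng = random.Random(2)
series = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
pairs = list(zip(series, series[1:]))
full = event_fraction(pairs, eta=2.0)
boot = bootstrap_event_fractions(pairs, eta=2.0, sample_size=20_000)
spread = max(boot) - min(boot)
expected = 0.5 * math.erfc(2.0 / 2.0)  # Gaussian total probability, sigma = 1
```

The spread of the bootstrap values gives a rough impression of how strongly a counted quantity fluctuates when the number of events in the sample changes.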

In order to check the results obtained by the evaluation
of $c(\eta, x_n)$, we compute also the corresponding ROCs
analytically and numerically.
FIG. 1: The condition $c(\eta, x_n)$ for the Gaussian distribution
as given by Eq. (21). The color shaded regions indicate
the intervals $[-\sigma\eta, -\sigma\eta/2]$ for which we can according to Eq.
(23) expect $c(\eta, x_n)$ to be positive. If $x_n < -\sigma\eta$, $\eta > 2\sqrt{\pi}$
and terms of the order of $\exp(-(x_n + \sigma\eta)^2)$ are sufficiently
small, the condition is also positive. If terms of the order
of $\exp(-(x_n + \sigma\eta)^2)$ cannot be neglected one also might find
small regions in $(-\infty, -\sigma\eta]$ for which $c(\eta, x_n) < 0$. However,
the influence of these regions is negligible, since our alarm
interval is defined as $[-\infty, \delta]$, which implies an averaging over
several possible values of the precursory variable.

FIG. 2: Comparison of the numerically evaluated condition
$c(\eta, x_n)$ for the Gaussian distribution and the expression given
by Eq. (21). The black curves denote the evaluation of the
analytic result in Eq. (21), the curves plotted with lines and
symbols represent the numerical results obtained from the
original data set, and the dashed lines represent the results
obtained from the corresponding bootstrap samples. The gray
(green in the colored plot) regions indicate the regime $-\sigma\eta <
x_n < -\sigma\eta/2$ for which $c(\eta, x_n)$ is positive in the limit $\eta \to \infty$.
The numerical evaluation of $c(\eta, x_n)$ was done by sampling
the likelihood and the total probability of events from 10^
random numbers.



Note that for both the numerical evaluation of the
condition and the ROC curves, we used only event magnitudes
$\eta$ for which we found at least 1000 events, so that
the observed effects are not due to a lack of statistics of
the large events.



A. Gaussian distributed random numbers

In the first example we assume the sequence of i.i.d.
random numbers which form our time series to be normally
distributed. As shown previously, increments within
Gaussian random numbers are better predictable, the
more extreme they are. In this section we will show that
their PDFs also fulfill the condition in Eq. (16). Applying
the filter mechanism developed in the appendix
we obtain the following expressions for the a posteriori
PDF

$$
p_c(x_n, \eta) = \frac{\exp\left(-\frac{x_n^2}{2\sigma^2}\right)}{2\sqrt{2\pi}\,\sigma\, P(\eta)}\;
\mathrm{erfc}\left(\frac{x_n + \sigma\eta}{\sigma\sqrt{2}}\right) \tag{18}
$$

and the likelihood

$$
L(\eta, x_n) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{x_n + \sigma\eta}{\sigma\sqrt{2}}\right). \tag{19}
$$

We recall that the optimal precursor is given by the value
of $x_n$ which maximizes the likelihood. We refer to this
special value of the variable $x_n$ by $x_{\mathrm{pre}}$ and find for the
likelihood according to Eq. (19) $x_{\mathrm{pre}} = -\infty$. Thus, instead
of a finite alarm volume, $\delta$ here is the upper limit
of the interval $[-\infty, \delta]$. The total probability to find
increments of magnitude $\eta$ is given by

$$
P(\eta) = \frac{1}{2}\,\mathrm{erfc}(\eta/2). \tag{20}
$$

Hence the condition in Eq. (16) reads

$$
c(\eta, x_n) = -\sqrt{\frac{2}{\pi}}\,\frac{\exp(-z^2)}{\mathrm{erfc}(z)}
+ \frac{1}{\sqrt{\pi}}\,
\frac{\exp\left(-\frac{\eta^2}{4}\right)\left(1 - \frac{1}{2}\mathrm{erfc}(z)\right)}
     {\mathrm{erfc}\left(\frac{\eta}{2}\right)\left(1 - \frac{1}{2}\mathrm{erfc}\left(\frac{\eta}{2}\right)\right)},
\qquad \text{with } z = \frac{x_n + \sigma\eta}{\sqrt{2}\,\sigma}. \tag{21}
$$

Fig. 1 illustrates this expression and Fig. 2 compares
it to the numerical results. For the ideal precursor
$x_n = x_{\mathrm{pre}} = -\infty$ the condition $c(\eta, x_n)$ is, according
to Eq. (21), zero, since in this case the slope of the
ROC curve tends to infinity [4] and does not react to
any variation in $\eta$. For any finite value of the precursory
variable $x_n < 0$ we have to distinguish three regimes of
$z = (x_n + \sigma\eta)/\sqrt{2}\sigma$, namely, $z \to \infty$, $z \to -\infty$, and
finally also the case $z = 0$.

In the first case we study the behavior of $c(\eta, x_n)$ for
a fixed value of the precursory variable $-\sigma\eta < x_n$ and
$\eta \to \infty$. This implies that $z \to \infty$ and we can use the
asymptotic expansion for large arguments of the complementary
error function
$$
\mathrm{erfc}(z) \sim \frac{\exp(-z^2)}{\sqrt{\pi}\, z}
\left( 1 + \sum_{m=1}^{\infty} (-1)^m\, \frac{1 \cdot 3 \cdots (2m-1)}{(2z^2)^m} \right),
\qquad z \to \infty,\; |\arg z| < \frac{3\pi}{4}, \tag{22}
$$

which can be found in [19] to obtain

$$
c(\eta, x_n) \propto -\frac{x_n}{\sigma} - \frac{\eta}{2},
\qquad -\sigma\eta < x_n < 0. \tag{23}
$$

This expression is appropriate for $x_n > -\sigma\eta$ since the
asymptotic expansion in Eq. (22) holds only if the argument
of the complementary error function is positive. In
this case $c(\eta, x_n)$ is larger than zero if $x_n$ is fixed and
finite and $-\sigma\eta < x_n < -\sigma\eta/2$.

FIG. 3: ROCs for Gaussian distributed i.i.d. random variables
(horizontal axis: rate of false alarms). The symbols represent
ROC curves which were made via predicting increments in 10^
normal i.i.d. random numbers. The predictions were made
according to the prediction strategy described in Sec. II A.
The lines represent the results of evaluating the integrals in
Eqs. (6) and (7) for the Gaussian case. Note that the quality
of the prediction increases with increasing event magnitude.

In the second case, we assume $\eta \gg 1$ to be fixed, $x_n <
-\sigma\eta$ and $x_n \to -\infty$. Hence we can use the expansion in
Eq. (22) only to obtain the asymptotic behavior of the
dependence on $\eta$ and not for the dependence on $z$. An
asymptotic expression of $c(\eta, x_n)$ hence reads



c{ri,Xn) OC 



2(l-ierfc(77/2)) 
-O (exp(-z^)) , X, 



I < -<yr] 



(24) 



Since $\mathrm{erf}(z)$ tends to minus unity as $z \to -\infty$, the expression
in Eq. (24) is positive if $\eta > 2\sqrt{\pi}$ and if we
can assume the squared exponential term to be sufficiently
small. If the latter assumption is not fulfilled
one might observe some regions of intermediate values
of $-\infty < x_n < -\sigma\eta$ for which $c(\eta, x_n)$ is negative.

However, the ROC curves in Fig. 3 suggest that the influence
of these regions is sufficiently small if the alarm
volume is chosen to be $[-\infty, \delta]$. We can understand this
effect if we keep in mind that we use the interval $[-\infty, \delta]$
as an alarm volume. Hence we can expect that the influence
of the regions where $c(\eta, x_n)$ is negative is suppressed,
since we average over many different values of
$x_n$ and the condition is positive. (Positive
is meant here in the sense that $c(\eta, x_n)$ approaches the
value zero for $x_n = -\infty$ from small positive numbers.)

In the third case, for Xn = —err/ and hence z = we 

find that 0(77, Xn) is positive if > 2y^ (l — ierfc(77/2)) . 

In total we can expect increments in Gaussian
random numbers to be easier to predict the larger they
are. The ROCs in Fig. 3 support these results.
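
The statement about the Gaussian ROC curves can be checked with a short numerical integration. The sketch below assumes $\sigma = 1$, the alarm interval $[-\infty, \delta]$, and the a posteriori PDFs $p_c = p(x_n) L / P$ and $p_f = p(x_n)(1 - L)/(1 - P)$ obtained from the likelihood and the total probability via Bayes' theorem; the integration range, the step size, and the function names are hypothetical choices.

```python
import math

def gaussian_roc(eta, x_lo=-8.0, x_hi=8.0, step=0.01):
    """ROC points (r_f, r_c) for alarm intervals [-inf, delta], sigma = 1.

    Both rates are cumulative trapezoidal integrals of the a posteriori
    PDFs up to delta; the range [x_lo, x_hi] stands in for the real line.
    """
    P = 0.5 * math.erfc(eta / 2.0)
    n = int(round((x_hi - x_lo) / step))
    r_c = r_f = 0.0
    points = [(0.0, 0.0)]
    prev = None
    for i in range(n + 1):
        x = x_lo + i * step
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        L = 0.5 * math.erfc((x + eta) / math.sqrt(2.0))
        cur = (pdf * L / P, pdf * (1.0 - L) / (1.0 - P))
        if prev is not None:
            r_c += 0.5 * step * (prev[0] + cur[0])
            r_f += 0.5 * step * (prev[1] + cur[1])
            points.append((r_f, r_c))
        prev = cur
    return points

def hit_rate_at(points, target_r_f):
    """Linearly interpolate r_c at a given rate of false alarms."""
    for (f0, c0), (f1, c1) in zip(points, points[1:]):
        if f0 <= target_r_f <= f1 and f1 > f0:
            return c0 + (c1 - c0) * (target_r_f - f0) / (f1 - f0)
    raise ValueError("target false alarm rate not covered by the curve")

small_events = hit_rate_at(gaussian_roc(1.0), 0.1)
large_events = hit_rate_at(gaussian_roc(3.0), 0.1)
```

Comparing the two hit rates at the same rate of false alarms is one simple way to read off the ordering of the ROC curves with respect to the event magnitude.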



B. Symmetrized exponential distributed random 
variables 



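The PDF of the symmetrized exponential reads

$$
p(x_n) =
\begin{cases}
\frac{\lambda}{2}\exp(-\lambda x_n) & x_n > 0, \\
\lambda/2 & x_n = 0, \\
\frac{\lambda}{2}\exp(\lambda x_n) & x_n < 0,
\end{cases}
$$

with $\mu = 0$, $\sigma = \sqrt{2}/\lambda$.

Applying the filtering mechanism according to the appendix
we find the joint PDFs of precursory variable and
event

$$
j(x_n, Y_{n+1}(\eta) = 1) =
\begin{cases}
\frac{\lambda}{4}\exp(-\sqrt{2}\eta - 2\lambda x_n) & x_n > 0, \\
\frac{\lambda}{4}\exp(-\sqrt{2}\eta) & -\eta < x_n < 0, \\
\frac{\lambda}{2}\left(\exp(\lambda x_n) - \frac{1}{2}\exp(\sqrt{2}\eta + 2\lambda x_n)\right) & x_n < -\eta < 0,
\end{cases}
\tag{25}
$$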
FIG. 4: The numerically and analytically evaluated condition
for the symmetrized exponential. The black line is
the result of the analytical evaluation according to Eq. (31),
the curves plotted with lines and symbols represent the
numerical results obtained from the original data set, and the
dashed lines represent the results obtained from the
corresponding bootstrap samples. Note that for small values of
$x_n$ the condition $c(\eta, x_n, \lambda)$ is for all values of $\eta$ close to zero.

FIG. 5: The ROCs for symmetrically exponentially distributed
i.i.d. random numbers (horizontal axis: rate of false alarms)
show no significant dependence on the event magnitude. The ROC
curves were made via predicting increments in 10^ normal i.i.d.
random numbers and the predictions were made according to the
prediction strategy described in Sec. II A. The black line indicates
the analytically evaluated ROC curve for $\eta = 0$.



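the a posteriori probabilities,

$$
p_c(x_n, \eta, \lambda) =
\begin{cases}
\dfrac{\lambda \exp(-2\lambda x_n)}{2 + \sqrt{2}\eta} & x_n > 0, \\[2ex]
\dfrac{\lambda}{2 + \sqrt{2}\eta} & -\eta < x_n < 0, \\[2ex]
\dfrac{\lambda}{2 + \sqrt{2}\eta}\left(2\exp(\sqrt{2}\eta + \lambda x_n) - \exp(2\sqrt{2}\eta + 2\lambda x_n)\right) & x_n < -\eta < 0,
\end{cases}
\tag{26}
$$

$$
p_f(x_n, \eta, \lambda) =
\begin{cases}
\dfrac{\lambda}{2}\,\dfrac{\exp(-\lambda x_n)\left(1 - \frac{1}{2}\exp(-\sqrt{2}\eta - \lambda x_n)\right)}{1 - \frac{1}{2}\left(1 + \frac{\eta}{\sqrt{2}}\right)\exp(-\sqrt{2}\eta)} & x_n > 0, \\[3ex]
\dfrac{\lambda}{2}\,\dfrac{\exp(\lambda x_n)\left(1 - \frac{1}{2}\exp(-\sqrt{2}\eta - \lambda x_n)\right)}{1 - \frac{1}{2}\left(1 + \frac{\eta}{\sqrt{2}}\right)\exp(-\sqrt{2}\eta)} & -\eta < x_n < 0, \\[3ex]
\dfrac{\lambda}{4}\,\dfrac{\exp(2\lambda x_n + \sqrt{2}\eta)}{1 - \frac{1}{2}\left(1 + \frac{\eta}{\sqrt{2}}\right)\exp(-\sqrt{2}\eta)} & x_n < -\eta < 0,
\end{cases}
\tag{27}
$$

the likelihood

$$
L(\eta, x_n, \lambda) =
\begin{cases}
\frac{1}{2}\exp(-\sqrt{2}\eta - \lambda x_n) & x_n > 0, \\
\frac{1}{2}\exp(-\sqrt{2}\eta - \lambda x_n) & -\eta < x_n < 0, \\
1 - \frac{1}{2}\exp(\sqrt{2}\eta + \lambda x_n) & x_n < -\eta < 0,
\end{cases}
\tag{28}
$$

and the total probability to find events of magnitude $\eta$

$$
P(\eta) =
\begin{cases}
\frac{1}{8}\exp(-\sqrt{2}\eta) & x_n > 0, \\
\frac{\eta}{2\sqrt{2}}\exp(-\sqrt{2}\eta) & -\eta < x_n < 0, \\
\frac{3}{8}\exp(-\sqrt{2}\eta) & x_n < -\eta < 0.
\end{cases}
\tag{29}
$$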

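If we are not interested in the range of the precursory
variable, the total probability to find events is given by

$$
P(\eta) = \frac{1}{2}\exp(-\sqrt{2}\eta)\left(1 + \frac{\eta}{\sqrt{2}}\right). \tag{30}
$$

Hence the condition $c(\eta, x_n, \lambda)$ reads

$$
c(\eta, x_n, \lambda) =
\begin{cases}
-\sqrt{2} + \dfrac{1 - \frac{1}{2}\exp(-\sqrt{2}\eta - \lambda x_n)}{1 - P(\eta)}\left(\sqrt{2} - \dfrac{1}{\sqrt{2} + \eta}\right) & x_n > -\eta, \\[3ex]
-\dfrac{1}{\sqrt{2}}\exp(\lambda x_n + \sqrt{2}\eta)\left(\dfrac{1}{1 - \frac{1}{2}\exp(\lambda x_n + \sqrt{2}\eta)} - \dfrac{1 - \frac{1}{2 + \sqrt{2}\eta}}{1 - P(\eta)}\right) & x_n < -\eta.
\end{cases}
\tag{31}
$$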



Figure 4 compares the results of the numerical evaluation
of the condition and the analytical expression given
by Eq. (31). Since most precursors of large increments
can be found among negative values, the numerical evaluation
of $c(\eta, x_n, \lambda)$ becomes worse for positive values of
$x_n$, since in this limit the likelihood is not very well sampled
from the data. This leads also to the wide spread of
the bootstrap samples in this region.

Figure 4 shows that in the vicinity of the smallest value
of the data set, the condition $c(\eta, x_n, \lambda)$ is zero. As we
approach larger values of $\eta$, $c(\eta, x_n, \lambda)$ approaches zero
in the whole range of data values. That is why we would
expect to see no influence of the event magnitude on the
quality of predictions in the exponential case.

The ROC curves in Fig. 5 support these results. The
numerical ROC curves were made via predicting increments
in 10^ normal i.i.d. random numbers according to
the prediction strategy described in Sec. II A. The precursor
for the ROC curves is chosen as the maximum of
the likelihood according to Eq. (28),
so that the alarm interval is $[-\infty, \delta]$. In summary there
is no significant dependence on the event magnitude for
the prediction of increments in a sequence of symmetrically
exponentially distributed random numbers.
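
The exponential case lends itself to a quick Monte Carlo check. The sketch below assumes that an event is an increment of at least $\sigma\eta$ with $\sigma = \sqrt{2}/\lambda$, so that the case distinction of the likelihood happens at $x_n = -\sigma\eta$, and it uses the closed forms $L = \tfrac12 \exp(-\sqrt{2}\eta - \lambda x_n)$ or $1 - \tfrac12\exp(\sqrt{2}\eta + \lambda x_n)$ and $P(\eta) = \tfrac12\exp(-\sqrt{2}\eta)(1 + \eta/\sqrt{2})$. The function names, the sample size, and the seed are hypothetical.

```python
import math
import random

def sample_laplace(rng, lam):
    """Symmetrized exponential: exponential magnitude with a random sign."""
    magnitude = rng.expovariate(lam)
    return magnitude if rng.random() < 0.5 else -magnitude

def laplace_likelihood(eta, x_n, lam):
    """Probability that x_{n+1} - x_n >= sigma * eta, sigma = sqrt(2) / lam."""
    a = math.sqrt(2.0) * eta + lam * x_n  # equals lam * (x_n + sigma * eta)
    if a >= 0.0:
        return 0.5 * math.exp(-a)
    return 1.0 - 0.5 * math.exp(a)

def laplace_event_probability(eta):
    """Total probability to find an increment of magnitude eta or larger."""
    return 0.5 * math.exp(-math.sqrt(2.0) * eta) * (1.0 + eta / math.sqrt(2.0))

lam, eta = 1.0, 1.0
sigma = math.sqrt(2.0) / lam
rng = random.Random(3)
series = [sample_laplace(rng, lam) for _ in range(100_000)]
n_events = sum(1 for a, b in zip(series, series[1:]) if b - a >= sigma * eta)
empirical = n_events / (len(series) - 1)
analytic = laplace_event_probability(eta)
```

The counted fraction of events should agree with the analytic total probability up to sampling fluctuations.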




FIG. 6: The condition c(»7, a;„, Xmin, k) for the power-law dis- 
tribution with lower endpoint Xmin = 0.01 are plotted for 
constant values of the precursory variable Xn. The sym- 
bols represent the results of the numerical evaluation of 
c{'q,x„,Xmin,k), the gray (colored) lines denote the ana- 
lytic results, and the black lines denote the result for the 
corresponding bootstrap samples and the optimal precursor. 
For the "ideal" precursor Xn = Xmin = 0.01 all values of 
c(77, k, 0.01) are negative. Hence one should expect smaller 
events to be better predictable. However, this effect is sensi- 
tive of the choice of the precursor. 



conditional PDFs of the increments: 



C. Pareto distributed random variables 



We investigate the Pareto distribution as an example 
for power-law distributions. The PDF of the Pareto dis- 
tribution is defined as 1201 



p{x) — fcs^^jn X ^ 



(32) 



for X G [xmimoo) with the exponent A: > 3, the lower 
endpoint Xmin > 0, and variance a — \J'^2- Filter- 
ing for increments of magnitude rj we find the following 



kx 



2k 
rain 



(33) 



kxZ 



fc-l-l Y fc-2 



l-P(r;,/c) 



(34) 
(35) 



fc+l V fe-2 



Within the range {xmimOo) the likelihood has no well 
defined maximum. However, since the likelihood is a 
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FIG. 7: ROC-plot for the power-law distribution with k — 3 
and Xmin = 0.01. The symbols show the numerical results 
and the lines indicate the analytically calculated ROC curves. 
The ROC curves were made via predicting increments in 10^ 
Pareto distributed i.i.d. random numbers. The predictions 
were made according to the prediction strategy described in 
Sec. Ill Al Note that we tested only event magnitudes rj, for 
which we found at least 1000 events, so that the effects we 
observe are not due to a lack of statistics of the large events. 
The ROC curves display that in Pareto distributed i.i.d. ran- 
dom numbers with the lower endpoint Xmin = 0.01 smaller 
events are better predictable and that large events are very 
hard to predict. 



monotonously decreasing function, we use the lower end- 
point Xmin as E prccuisoi. The total probability to find 
events of magnitude ry is given by 

(36) 



FIG. 8: ROC-plot for the power-law distribution with k — 9 
and Xmin = 0.01. The symbols show the immerical results 
and the lines indicate the analytically calculated ROC curves. 
The ROC curves where made via predicting increments in 10^ 
Pareto distributed i.i.d. random numbers and the predictions 
were made according to the prediction strategy described in 
Sec. Ill Al The ROC-curves display that in Pareto distributed 
i.i.d. random numbers smaller events are better predictable 
and large events are especially hard to predict. 



smaller events to be better predictable. The correspond- 
ing ROC curves in Figs. [71 and [5] verify this statement of 

In summary we find that larger events in Pareto dis- 
tributed i. i. d. random numbers are harder to predict 
the larger they are. This is an admittedly unfortunate 
result, since extremely large events occur much more fre- 
quently in power-law distributed processes than in Gaus- 
sian distributed processes. Hence, their prediction would 
be highly desirable. 
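
The statement that the lower endpoint is the best precursor, and that the ROC curve approaches the diagonal for large events, can be illustrated with a minimal sketch. It assumes i.i.d. Pareto numbers with the values $k = 3$ and $x_{\min} = 0.01$ quoted for Fig. 7, for which the likelihood of an increment of size $d$ following $x_n$ is $(x_{\min}/(x_n + d))^k$. The function names, the sample size, the seed and the single alarm threshold (the median) are illustrative assumptions, not the settings used for the figures.

```python
import math
import random

def pareto_likelihood(x_n, d, x_min, k):
    """P(x_{n+1} - x_n >= d | x_n) for i.i.d. Pareto numbers, d >= 0."""
    return (x_min / (x_n + d)) ** k

def hit_and_false_alarm_rate(xs, d, alarm_below):
    """Give an alarm whenever x_n <= alarm_below; an event is x_{n+1} - x_n >= d."""
    hits = false_alarms = n_events = 0
    for x_n, x_next in zip(xs, xs[1:]):
        event = (x_next - x_n) >= d
        alarm = x_n <= alarm_below
        n_events += event
        hits += alarm and event
        false_alarms += alarm and not event
    n_nonevents = len(xs) - 1 - n_events
    return hits / n_events, false_alarms / n_nonevents

x_min, k = 0.01, 3.0                                  # values quoted for Fig. 7
sigma = x_min / (k - 1.0) * math.sqrt(k / (k - 2.0))  # Pareto standard deviation
random.seed(1)                                        # arbitrary seed (assumption)
xs = [x_min * random.paretovariate(k) for _ in range(400_000)]
alarm_below = x_min * 2.0 ** (1.0 / k)                # median: alarms on about half the steps

for eta in (1.0, 8.0):
    hit, far = hit_and_false_alarm_rate(xs, eta * sigma, alarm_below)
    print(f"eta={eta}: hit rate {hit:.3f}, false alarm rate {far:.3f}")
```

Because the likelihood decreases monotonically in $x_n$, alarms placed next to $x_{\min}$ are the best available choice. For large $d$ the likelihood becomes nearly independent of $x_n$, so the hit rate moves towards the false alarm rate.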



where ${}_2F_1(a, b, c, x)$ denotes the hypergeometric function
${}_pF_q(a, b, c, x)$ with $p = 2$, $q = 1$.
Using



$$\frac{dP(\eta, k)}{d\eta} = \ldots \qquad (37)$$

[The right-hand side of Eq. (37) is garbled beyond recovery in this copy.]



and inserting the expressions given above for the components
of $c(\eta, x_n, x_{\min}, k)$ we can obtain an explicit
analytic expression for the condition. In Fig. 6 we evaluate
this expression using Mathematica and compare it with
the results of an empirical evaluation on the data set of
10^ i.i.d. random numbers.

Figure 6 shows that the value of $c(\eta, x_n, x_{\min}, k)$
depends sensitively on the choice of the precursor. For
the ideal precursor $x_{\mathrm{pre}} = x_{\min}$ all values of $c(\eta, k, x_{\min})$
are negative. Hence one should in this case expect



IV. INCREMENTS IN FREE JET DATA 

In this section, we apply the method of statistical inference
to predict acceleration increments in free jet data.
For this purpose we use a data set of 1.25 x 10^ samples of the
local velocity measured in the turbulent region of a round
free jet [21]. The data were sampled by a hot-wire measurement
in the central region of an air-into-air free jet.
One can then calculate the PDF of velocity increments
$a_{n,k} = v_{n+k} - v_n$, where $v_n$ and $v_{n+k}$ are the velocities
measured at time steps $n$ and $n + k$. The Taylor hypothesis
allows one to relate the time resolution to a spatial
resolution [21]. One observes that for large values of $k$ the
PDF of increments is essentially indistinguishable from a
Gaussian, whereas for small $k$ the PDF develops approximately
exponential wings [13]. Fig. 9 illustrates
this effect using the data set under study. Thus the
incremental data sets $a_{n,k}$ provide us with the opportunity
to test the results for statistical predictions within Gaussian
and exponentially distributed i.i.d. random numbers
FIG. 9: PDF of the increments $a_{n,k} = v_{n+k} - v_n$ with
$k = 1, 3, 10, 35, 144, 285$. The black lines correspond to Gaussian
and exponential PDFs with appropriate values for the
standard deviation or the coefficient $\lambda$.

on a data set that exhibits correlated structures.

We are now interested in predicting increments of the
acceleration $a_{n+j,k} - a_{n,k} \geq \eta$ in the incremental data sets
$a_{n,k} = v_{n+k} - v_n$. In the following we concentrate on the
incremental data set $a_{n,10}$, which has an asymptotically
exponential PDF, and the data set $a_{n,144}$, which has an
asymptotically Gaussian PDF. Furthermore we focus on
increments between relatively large time steps, i.e., $j = 285$,
so that the short-range persistence of the process
does not prevent large events from occurring. As in the
previous sections we are hence exploiting the statistical
properties of the time series to make predictions, rather
than the dynamical properties.
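
As a minimal sketch of the bookkeeping involved, the incremental series $a_{n,k} = v_{n+k} - v_n$ and the events $a_{n+j,k} - a_{n,k} \geq d$ could be computed as follows. The function names and the toy velocity list are illustrative assumptions; the free jet data themselves are not reproduced here.

```python
def increments(v, k):
    """Return a_{n,k} = v[n+k] - v[n] for every n with n + k < len(v)."""
    return [v[n + k] - v[n] for n in range(len(v) - k)]

def event_flags(a, j, d):
    """Flag time step n as an event when a[n+j] - a[n] >= d."""
    return [a[n + j] - a[n] >= d for n in range(len(a) - j)]

# Toy stand-in for a velocity record (assumption, not measured data).
v = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 21.0]
a = increments(v, k=2)            # [3.0, 5.0, 7.0, 9.0, 11.0]
flags = event_flags(a, j=1, d=2.0)
print(a, flags)
```

For the data sets discussed in the text one would use $k = 10$ or $k = 144$ and $j = 285$, with $d$ chosen from the event magnitude under study.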

We can now use the evaluation algorithm which was
tested on the previous examples to evaluate the condition
for these data sets. The results are shown in Fig. 10. We
find that at least for larger values of $\eta$ the main features
of $c(x_n, \eta)$ for the exponential and the Gaussian case as
described in Secs. III A and III B are also present in the
free jet data. For larger values of $\eta$, $c(a_{n,k}, \eta)$ is either
larger than zero in the Gaussian case ($k = 144$) or equal
to zero in the exponential case ($k = 10$) in the region of
interesting precursory variables, i.e., small values of $a_{n,k}$.

However, the presence of the exponential and the 
Gaussian distributions is more prominent in the corre- 
sponding ROC curves. For the free jet data set, the 
predictions were made with an algorithm similar to the 
one described in Sec. II A. Instead of a specific precursory
structure, which corresponds to the maximum of
the likelihood, we use here a threshold of the likelihood 
as a precursor. In this setting we give an alarm for an 
extreme event, whenever the likelihood that an extreme 
event follows an observation is larger than a given thresh- 
old value. 
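
The threshold strategy just described can be sketched as follows. Since the free jet data are not available here, the sketch substitutes zero-mean i.i.d. Gaussian numbers, for which the likelihood of an increment $x_{n+1} - x_n \geq \sigma\eta$ given $x_n$ is known in closed form. The function names, sample size, seed and threshold grid are illustrative assumptions.

```python
import math
import random

def likelihood_gauss(x_n, eta, sigma=1.0):
    """P(x_{n+1} - x_n >= sigma*eta | x_n) for zero-mean i.i.d. Gaussian data."""
    return 0.5 * math.erfc((x_n + sigma * eta) / (sigma * math.sqrt(2.0)))

def roc_points(xs, eta, thresholds, sigma=1.0):
    """One (false alarm rate, hit rate) pair per likelihood threshold."""
    events = [xs[i + 1] - xs[i] >= sigma * eta for i in range(len(xs) - 1)]
    likelihoods = [likelihood_gauss(x, eta, sigma) for x in xs[:-1]]
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    points = []
    for thr in thresholds:
        # Alarm whenever the likelihood reaches the threshold.
        hits = sum(1 for l, e in zip(likelihoods, events) if l >= thr and e)
        false_alarms = sum(1 for l, e in zip(likelihoods, events) if l >= thr and not e)
        points.append((false_alarms / n_nonevents, hits / n_events))
    return points

random.seed(0)                                  # arbitrary seed (assumption)
data = [random.gauss(0.0, 1.0) for _ in range(20_000)]
thresholds = [i / 20 for i in range(21)]        # 0, 0.05, ..., 1
roc = roc_points(data, eta=2.0, thresholds=thresholds)
```

Sweeping the threshold from 0 to 1 traces the ROC curve from the point (1, 1), where an alarm is always given, to (0, 0), where no alarm is given. For measured data the likelihood would have to be estimated numerically instead of taken from a closed form.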

In the exponential case ($k = 10$) shown in Fig. 11(a)
the ROC curves for different event magnitudes $\eta$ almost
coincide, although the range of $\eta$ is larger ($\eta \in (0, 6.71)$)
than in the Gaussian case shown in Fig. 11(b). For $k = 144$
the ROC curves are further apart, which corresponds
to the results of Secs. III A and III B.

This example of the free jet data set shows that the 
specific dependence of the ROC curve on the event mag- 
nitude can also in the case of correlated data sets be 
characterized by the PDF of the underlying process. 



V. CONCLUSIONS 

We study the magnitude dependence of the quality of
predictions for increments in time series which consist
of sequences of i.i.d. random numbers and in acceleration
increments measured in a free jet flow. Using the first
part of the increment, $x_n$, as a precursory variable we predict
large increments $x_{n+1} - x_n$ via statistical considerations.
In order to measure the quality of the predictions
we use ROC curves. Furthermore we introduce a quantitative
criterion which can determine whether larger or
smaller events are better predictable. This criterion is
tested for time series of Gaussian, exponential and Pareto
i.i.d. random variables and for the increments of the
acceleration in the free jet flow. The results obtained from
the criterion agree well with the corresponding ROC
curves. Note that for both the numerical evaluation of
the condition and the ROC plots, we used only event
magnitudes $\eta$ for which we found at least 1000 events, so
that the observed effects are not due to a lack of statistics
of the large events.

In the sequence of Gaussian i.i.d. random numbers, 
we find that large increments are better predictable the 
larger they are. In the Pareto distributed time series we 
observe that in slowly decaying power laws larger events 
are harder to predict, the larger they are. We find no 
significant dependence on the event-magnitude for the 
sequence of exponentially distributed i.i.d. random numbers.

While the condition can be easily evaluated analyti- 
cally, it is not that easy to compute numerically from 
observed data, since the calculation implies evaluating 
the derivatives of numerically obtained distributions. Us- 
ing Savitzky-Golay filters improved the results, but espe- 
cially in the limit of larger events, where the distributions 
are difficult to sample, one cannot trust the results of 
the numerically evaluated criterion. However, it is still 
possible to apply the criterion by fitting a PDF to the 
distribution of the underlying process and then evaluate 
the criterion analytically. 
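
To illustrate why a Savitzky-Golay filter helps with derivatives of sampled distributions, the following sketch implements the first-derivative filter for a local quadratic fit on equally spaced samples. For this case the derivative at the window centre reduces to $\sum_i i\,y_{n+i} / (h \sum_i i^2)$, with $h$ the sample spacing. The function name, the window size and the test curve are illustrative assumptions, not the settings used for the results reported here.

```python
def savgol_first_derivative(y, half_width, step):
    """First derivative of equally spaced samples from a quadratic
    Savitzky-Golay filter with window length 2*half_width + 1.

    Points closer than half_width to either end have no full window
    and are returned as None.
    """
    m = half_width
    norm = step * sum(i * i for i in range(-m, m + 1))
    out = [None] * len(y)
    for n in range(m, len(y) - m):
        out[n] = sum(i * y[n + i] for i in range(-m, m + 1)) / norm
    return out

# Check on a smooth curve whose derivative is known: y = x^2, dy/dx = 2x.
step = 0.1
xs = [i * step for i in range(50)]
ys = [x * x for x in xs]
dys = savgol_first_derivative(ys, half_width=3, step=step)
```

A wider window averages over more samples and suppresses more noise, at the cost of smearing out sharp features. This trade-off becomes hard to balance in the sparsely sampled tails, which is consistent with the limitation noted above.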

Although the magnitude dependence of the quality of 
predictions was observed in different contexts and for dif- 
ferent measures of predictability, in this contribution only 
ROC curves were used. In order to exclude the possibility 
that the effect is specific to the ROC curve, future work
should also include other measures of predictability.

Reviewing the results for the Gaussian case and the
slowly decaying power law from a philosophical point of
view, one can conclude that nature allows us to predict
large events from the most frequently occurring distribution
easily. However, in Gaussian distributions very large
events are rare and therefore less likely to cause damage,
whereas in the less frequently occurring distributions
with heavy power-law tails, large events are especially
hard to predict. Therefore one can assume that
rare large-impact events of processes with power-law
distributions will remain unpredictable, although their
prediction would be highly desirable.

FIG. 10: Transition from the exponential regime (a) to the Gaussian regime (b) characterized via the numerical evaluation of
$c(x_n, \eta)$. The black line corresponds to the analytic results for the Gaussian and the exponential PDF, fitted to the PDFs of
the increments, as shown in Fig. 9. For larger values of $\eta$ the main features of $c(x_n, \eta)$ for the exponential and the Gaussian
case as described in Secs. III A and III B are reproduced. For larger values of $\eta$ we find that if $-\sigma\eta < a_{n,k} < -\sigma\eta/2$, $c(a_{n,k}, \eta)$ is
either larger than zero in the asymptotically Gaussian case ($k = 144$) or equal to zero in the asymptotically exponential case
($k = 10$).

FIG. 11: Transition from exponential ROC curves (a) to Gaussian ROC curves (b). In the exponential case ($k = 10$), shown
in (a), the ROC curves for different event magnitudes $\eta$ are almost the same, although the range of $\eta$ is larger ($\eta \in (0, 6.71)$)
than in the Gaussian case shown in (b). For $k = 144$ the ROC curves are further apart, which corresponds to the results for
Gaussian ROC curves (see Sec. III A).
APPENDIX A: OBTAINING THE ANALYTIC
EXPRESSION FOR THE LIKELIHOOD, THE
JOINT AND THE APOSTERIOR PDFS FOR
INCREMENTS IN STOCHASTIC PROCESSES



An analytic expression for a filter which selects the
PDF of our extreme increments $x_{n+1} - x_n \geq d$ out
of the PDFs of the underlying stochastic process can
be obtained through the Heaviside function
$\Theta(x_{n+1} - x_n - d)$. (Note that $d$ is not scaled by the standard
deviation, i.e., $d = \sigma\eta$.) This filter is then applied to
the joint PDF $j(x_0, x_1, \ldots, x_{n-k+1}, x_{n-k+2}, \ldots, x_n)$ of a
stochastic process or, to be more precise, to the likelihood
$L(x_{n+1} | x_0, x_1, \ldots, x_{n-k+1}, x_{n-k+2}, \ldots, x_n)$ that the $(n+1)$th
step follows the previously obtained values. If we condition
only on the last $k$ values, we neglect the dependence
on the past. The likelihood that an event $Y(d) = 1$ follows




If the resulting expression is nonzero, the condition of the
extreme event in Eq. (17) is fulfilled and for $x_{n+1}$ and $x_n$
the following relation holds:

$x_{n+1} = x_n + d + \gamma \quad (\gamma \in \mathbb{R},\ \gamma \geq 0)$. (A2)




Hence the joint PDF, the aposterior PDF and the total 
probability to find increments are given by 




$j(\mathbf{x}_{(n-k+1,n)}, Y_{n+1}(d) = 1) = \ldots$

$\rho(\mathbf{x}_{(n-k+1,n)} | Y_{n+1}(d) = 1) = \ldots$

$P(Y(d) = 1) = \ldots$



Whether we can access a given stochastic process analytically
or not depends on whether the
integrals in Eq. (A6) can be solved.

If we are interested in the prediction of threshold crossings
instead of increments, we can interpret $\eta$ as the mag-